                                                        April 8, 1973


                  T A B L E   O F   C O N T E N T S
                  _ _ _ _ _   _ _   _ _ _ _ _ _ _ _




                               SECTION                           PAGE




I      BLOCK DIAGRAM


II     THE MANUSCRIPT

            II-A      TEXT EXPRESSIONS .  .  .  .  .  .  .  .  .  . 2

            II-B      GLYPHS  .  .  .  .  .  .  .  .  .  .  .  .  . 3

            II-C      THE DEVICE SPECIFICATION  .  .  .  .  .  .  . 3

            II-D      THE FORMATTER .  .  .  .  .  .  .  .  .  .  . 4


III    GALLEYS

            III-A     THE PAGINATOR .  .  .  .  .  .  .  .  .  .  . 6

            III-B     THE POLISHER AND THE DOCUMENT   .  .  .  .  . 7

            III-C     THE PRINTER/VIEWER  .  .  .  .  .  .  .  .  . 7


IV     THE REGISTRY

            IV-A      GLYPH FILES   .  .  .  .  .  .  .  .  .  .  . 8


V      THE MLISP EXTENSION

            V-A       ADVANTAGES AND DISADVANTAGES OF MLISP .  .   10


VI     STANDARDS


VII    REALIZATION


VIII   APPENDICES


IX     SOME THOUGHTS ON STANDARD CHARACTER REPRESENTATION

            IX-A      Registry Character Representation  .  .  .   15

            IX-B      Installation Implementation  .  .  .  .  .   16


X      MATHEMATICAL NOTATION

            X-A       MATHEMATICS -- IMPLEMENTATION NOTES   .  .   19

            X-B       FONT INFORMATION STORAGE. .  .  .  .  .  .   20


XI     PROPOSAL FOR GRAPHICS LANGUAGE


XII    FIGURE 1


XIII   FIGURE 2

























                                                        April 8, 1973







               A PROPOSAL FOR THE NEW DOCUMENT SYSTEM

             Larry Tesler, Brian Harvey, Lester Earnest,
                   Tovar Mock, and Robert Sproull





The new system has two main purposes:

(1) To provide a means for flexible production of medium-quality
documents such as technical reports, manuals, theses, and books which
may include text, line drawings, half-tone images, and mathematical
symbolism.

(2) To provide a standard representation for such documents that can
be printed or displayed on various kinds of output devices by various
kinds of computers with reasonable results.

The proposed participants in development of the new system are
Stanford University, Carnegie-Mellon University, and Xerox Palo Alto
Research Center.

This proposal was prepared by the Palo Alto Committee, consisting of
Stanford and Xerox people.  The Pittsburgh committee at CMU is
concurrently preparing its own proposal.  The two proposals shall be
exchanged as well as submitted to other interested parties for
comment, criticism, and reconciliation.


                              SECTION I
                              _______ _

                            BLOCK DIAGRAM
                            _____ _______




A block diagram of the proposed system is shown in Figure 1.  Dash-
boxes represent computer files; plus-boxes represent visible copy;
starred boxes represent programs.

The system starts with a "scribble" in an author's head or on paper.
Using a conventional TEXT EDITOR, the author prepares a "manuscript"
file encoded in a PUB-like language.  The manuscript is fed to the
FORMATTER program which produces a "galley proof".  The galley may be
printed (or displayed) by a PRINTER/VIEWER program to be proofread by
the author for errors.  To correct errors, changes are made to the
manuscript and the FORMATTER is run again.

Once an acceptable galley proof is obtained, it is fed to the
PAGINATOR and POLISHER programs which produce a "document" file.
This file may be printed (or displayed) by the PRINTER/VIEWER
program.  Again, if errors are discovered, corrections must be made
in the manuscript and the cycle repeated.

Auxiliary programs and files that appear in the block diagram will be
explained in subsequent sections.
























                             SECTION II
                             _______ __

                           THE MANUSCRIPT
                           ___ __________




The manuscript contains sufficient information for the system to
compute the document without human intervention.  Thus, the system is
basically non-interactive.  However, this does not preclude provision
for optional interaction at appropriate points for debugging and
advising purposes.

The manuscript is actually a computer program in the as yet unnamed
language P.  P is similar to PUB except that PUB is an augmented
subset of SAIL while P is an extension of MLISP.  The complete
facilities of MLISP are available to the author, including variables,
arrays, for-statements, recursion, list structures, function
declarations, and interaction.

Among the extensions to MLISP in P are "text expressions", "math
expressions", "calligraphic expressions", "image expressions",
"portion declarations", "area declarations", and "group
declarations".



TEXT EXPRESSIONS
____ ___________


II-A.  

Text expressions are equivalent to "paragraphs" in PUB.  Every text
expression has a "class", which may be specified in the manuscript
explicitly by name or implicitly by form (cf. "AT n" in PUB).
Associated with each class are formatting procedures.  Examples of
classes might be "prose", "quotation", "table", "heading", and
"Algolprogram".

A text expression is composed of "words" and each word is composed of
"virtual glyphs" (formerly called "characters").  An example of a
virtual glyph (or "virgle") is "Small Seriph Italic Upright Black
Alpha".  A "Glyph Map" fed to the system along with the manuscript
maps virgles into "actual glyphs" or "augles".  For example, the
glyph map may say that "Small" is "8 point", "Seriph" is "Elzevir",
and "Alpha" is "Greek 101".  Or it may map all sizes into one, all
fonts into LPTFONT, and all glyph-sets into ASCII characters.




The glyph map is conceptually an n-dimensional sparse array of
functions.  For example, "Large Seriph Italic A" may be specified as
appearing explicitly in a certain glyph file or may be specified as a
scale-reduction applied to an oversize glyph.
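
As a rough illustration only, the mapping from virgles to augles can
be sketched in Python (P itself is not yet defined).  The names
GLYPH_MAP, LPT_MAP, and resolve are hypothetical, and the sketch is
simplified to one rule per coordinate rather than a full
n-dimensional array.  The "Small", "Seriph", and "Alpha" entries are
the ones given above; each rule is either a table or a function.

```python
# Hypothetical glyph map: one rule per coordinate of a virtual glyph.
# The three entries are taken from the example in the text.
GLYPH_MAP = {
    "size": {"Small": "8 point"},
    "font": {"Seriph": "Elzevir"},
    "glyph": {"Alpha": "Greek 101"},
}

# A degenerate map for a line printer: all sizes map into one, all
# fonts into LPTFONT, and all glyph sets into ASCII.
LPT_MAP = {
    "size": lambda name: "one size",
    "font": lambda name: "LPTFONT",
    "glyph": lambda name: "ASCII",
}


def resolve(virgle, glyph_map):
    """Map a virtual glyph (coordinate -> virtual name) to an actual glyph."""
    augle = {}
    for coordinate, name in virgle.items():
        rule = glyph_map.get(coordinate)
        if rule is None:
            augle[coordinate] = name            # unmapped: pass through
        elif callable(rule):
            augle[coordinate] = rule(name)      # rule is a function
        else:
            augle[coordinate] = rule.get(name, name)
    return augle
```

For example, resolve({"size": "Small", "font": "Seriph"}, GLYPH_MAP)
yields an 8 point Elzevir glyph, while the same virgle under LPT_MAP
collapses to LPTFONT.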



GLYPHS
______


II-B.  

Among the n coordinates that define a glyph are:

(1) Code.  An integer between 40 and 172 octal selecting a particular
character out of a character set.

(2) Set.  A set of up to 91 characters, e.g., Greek Alphabet, Math
symbols 1, Accents.

(3) Case.  Upper, Lower.  Differs only for letters in alphabets.

(4) Style.  Light, Bold, Italic, Bold Italic, Demibold, etc.

(5) Font.  Caslon, Elzevir, Times Roman, Lptfont, Datadiscfont,
JohnDoefont.

(6) Size.  Measured in Points.  The P language has point-pica-inch
conversion primitives.

(7) Orientation.  Upright or some other angle between 0 and 360
degrees.

(8) Thickness.

(9) Texture.

(10) Color.
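
The octal code range in (1) and the set size in (2) agree, as a small
Python check shows.  This assumes the range is inclusive at both
ends; the names below are hypothetical.

```python
FIRST_CODE = 0o40     # octal 40, decimal 32
LAST_CODE = 0o172     # octal 172, decimal 122


def codes_per_set():
    """Number of distinct codes available within one glyph set."""
    return LAST_CODE - FIRST_CODE + 1


def is_valid_code(code):
    """True if the code lies in the inclusive range 40..172 octal."""
    return FIRST_CODE <= code <= LAST_CODE
```

Here codes_per_set() gives 91, matching the limit of 91 characters
per set.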



THE DEVICE SPECIFICATION
___ ______ _____________


II-C.  

A "Device Specification" file must be fed to the system along with
the manuscript and the Glyph Map.  Conceptually, the Device
Specification defines a printing or viewing device as a set of
attributes such as RASTERSCAN, 200PPI, 2FONTS, NOGRAYSCALE.
Actually, the file is a collection of MLISP DEFPROPs and procedures
through which the FORMATTER, PAGINATOR, and POLISHER programs filter
the manuscript to obtain a document that can be processed by the
PRINTER/VIEWER program for the specified device.

Keeping such procedures on a separate file (usually in LAP form for
efficiency) keeps the kernel system small even when new devices are
added to its capability.

The PRINTER/VIEWER program and the Device Specification File are
provided by each installation for each of its devices.  It may be
possible in some cases for an installation to use a single P/V and
Device Spec for several devices.  In such a case, a single document
file could be printable on all of them.



THE FORMATTER
___ _________


II-D.  

The FORMATTER program is similar to the PARSER and FILLER modules of
PUB.  The PARSER is replaced by the MLISP compiler and the LISP
system.  The FILLER is replaced by modules for text, math, line-
drawings, and images.  The pagination capabilities of PUB are
intentionally omitted to simplify the FORMATTER and to allow more
complex capabilities to be handled by the PAGINATOR program.

During operation of the FORMATTER, the author can monitor its
progress on a terminal, interrupt it at landmark points, and interact
with it at breakpoints and error points.

The FORMATTER may generate tables of contents, indices, etc.  in
manuscript format as in PUB.  If it does, it swaps in an ALPHABETIZER
program to sort the indices. Then the FORMATTER is swapped back in to
process the generated portions.

A hyphenation capability is included in the text module for those who
like it.

The manuscript is structured into one or more portions, each of which
may be divided into sections.  Non-global declarations are local to
portions and to sections (unlike PUB).  Thus, it is possible to
format sections independently, but care must be taken if there are
interactions (e.g., figure numbering that does not start over at 1 in
each section).
















































                             SECTION III
                             _______ ___

                               GALLEYS
                               _______




The FORMATTER outputs two files called the "galley" and the "galley
guide" (analogous to the PUInS.PUI and the PUIn.PUI files of PUB).

The galley contains text, drawing directives, and image directives,
with sufficient information so that the Printer/Viewer program can
display it provisionally justified but not paginated.  There is a
single column for each section.  Footnotes and diagrams appear close
after the text which references them. Cross-references are not
resolved.

The galley guide is an abstract of the galley in which content is
omitted, size information is elaborated, and pagination directives
are carried forward.  The galley guide contains sufficient
information for the PAGINATOR program to lay out the document into
pages, areas, boxes, and columns.



THE PAGINATOR
___ _________


III-A.  

The PAGINATOR Program does not input the galley but only the galley
guide.  It essentially juggles rectangles and possibly other shapes
to fit them into pages, areas, and columns, keeping groups together,
placing footnotes below their referents, and keeping figures near the
texts that describe them.
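
The rectangle juggling can be pictured with a deliberately naive
sketch in Python.  It is an assumption-laden toy, not the proposed
PAGINATOR: it handles a single column, knows only block heights, and
models "keeping groups together" by refusing to split consecutive
blocks that share a group name.  Footnote and figure placement are
not modeled.  The function name and tuple layout are hypothetical.

```python
def paginate(blocks, page_height):
    """Greedy single-column pagination.

    blocks: list of (label, height, group) in galley order; group is
    None for a block that may be separated from its neighbors.
    Returns a list of pages, each a list of labels.
    """
    # Merge consecutive blocks of the same group into unbreakable units.
    units = []
    for label, height, group in blocks:
        if units and group is not None and units[-1]["group"] == group:
            units[-1]["labels"].append(label)
            units[-1]["height"] += height
        else:
            units.append({"labels": [label], "height": height,
                          "group": group})

    pages, current, used = [], [], 0
    for unit in units:
        if unit["height"] > page_height:
            raise ValueError("unit taller than a page: %r" % unit["labels"])
        if used + unit["height"] > page_height:
            pages.append(current)       # page is full; start a new one
            current, used = [], 0
        current.extend(unit["labels"])
        used += unit["height"]
    if current:
        pages.append(current)
    return pages
```

Because only heights and group names are consulted, the sketch needs
nothing from the galley itself, which is the point of working from
the galley guide alone.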

The PAGINATOR needs to know device specifications but nothing about
glyphs.  It also needs to know the author's pagination directives
from the manuscript.  These can all be found in the galley guide.

The principal output of the PAGINATOR is the "Paginated Galley
Guide".  This is probably in the same format as the Galley Guide, but
its content is sorted, structured, and pruned.

Whenever the PAGINATOR completes a page, it writes all cross-
reference labels that appeared on that page onto a file called the
"Cross-Reference Table" (CRT? -- no, XRT!).




THE POLISHER AND THE DOCUMENT
___ ________ ___ ___ ________


III-B.  

Some Printer/Viewer programs may have the sophistication to be able
to input the galley, the paginated guide, and the XRT and display a
finished document (see dotted line in Figure 1).  However, the normal
procedure is to feed them to the POLISHER program which produces a
well-ordered "document" file in which pages are together and cross-
references are resolved.  This file is easily handled by the P/V.
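
How the XRT lets cross-references be resolved can be sketched in
Python.  The {ref:label} marker syntax and both function names are
invented for illustration; the actual galley and XRT formats are
still to be standardized.

```python
import re


def build_xrt(pages):
    """Build a cross-reference table from the labels found on each page.

    pages: list of lists of labels, page 1 first.
    """
    xrt = {}
    for number, labels in enumerate(pages, start=1):
        for label in labels:
            xrt[label] = number
    return xrt


def resolve_references(text, xrt):
    """Replace each {ref:label} marker with the page number of label."""
    def substitute(match):
        label = match.group(1)
        if label not in xrt:
            raise KeyError("unresolved cross-reference: " + label)
        return str(xrt[label])
    return re.sub(r"\{ref:([^}]+)\}", substitute, text)
```

A forward reference cannot be resolved until the page holding its
label has been completed, which is one reason to do this in a
separate pass after pagination.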



THE PRINTER/VIEWER
___ ______________


III-C.  

This device-dependent program can print either the galley or the
polished document, because both files are in the same format.

For raster devices, the P/V may have two passes.  One generates bit
matrices from vector/text representations, while the other actually
prints the matrices.

The P/V program may be parametric at the option of the installation.
In certain cases, it may be possible to substitute certain fonts for
others, to change the resolution specification, or to select certain
pages for output.

The P/V is the only program that looks at the actual images of
glyphs.  These glyphs are in a form appropriate to the device, e.g.,
octal code, bit matrix, vector outline.  The actual image is normally
computed from a contour representation extracted from the Registry.
















                             SECTION IV
                             _______ __

                            THE REGISTRY
                            ___ ________




There is a Network Registry of Glyphs as well as local registries.  A
document referring to a local registry cannot be transmitted over
the Network.  Use of local registries should be limited to storing
new glyphs that have not had an opportunity to be registered in the
Network Registry.

The Registry consists of a Glossary and a Directory.

The Glossary lists the available Sets, Cases, Styles, Fonts, and so
forth.  There is a procedure for adding new entries to the Glossary,
e.g., the Russian alphabet to the Set Glossary or Clarendon to the
Font Glossary.  It is also possible to add new characters to existing
incomplete sets.

The Directory lists every Glyph File registered by a participating
installation, including its coordinates in the sparse array, complete
file name, and site name.  The coordinates must use the terminology
of the Glossary.

It is not permissible to change a glyph file once it has been
registered in the Directory.
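
The two Directory rules (coordinates use Glossary terminology; a
registered entry is never changed) can be modeled with a toy Python
class.  The class, its methods, and the file and site names in the
usage note are hypothetical, and refusing to overwrite an entry is
only one possible reading of "not permissible to change".

```python
class Directory:
    """Toy model of the Registry Directory (layout is an assumption)."""

    def __init__(self, glossary):
        self.glossary = glossary   # coordinate name -> set of known terms
        self.entries = {}          # coordinates -> (file name, site name)

    def register(self, coordinates, file_name, site):
        # Every coordinate must use a term already in the Glossary.
        for name, term in coordinates.items():
            if term not in self.glossary.get(name, set()):
                raise ValueError("%s %r is not in the Glossary"
                                 % (name, term))
        key = frozenset(coordinates.items())
        # Once registered, an entry may not be replaced.
        if key in self.entries:
            raise ValueError("glyph file already registered")
        self.entries[key] = (file_name, site)

    def lookup(self, coordinates):
        return self.entries[frozenset(coordinates.items())]
```

For instance, after registering {"set": "Greek", "font": "Elzevir"}
as file "GREEK.GLY" at "SITE-A", a second registration at the same
coordinates is rejected.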



GLYPH FILES
_____ _____


IV-A.  

Each Network Glyph File defines up to 91 glyphs.  The file header
contains geometric information needed by the FORMATTER and POLISHER
programs, such as height, width, kerning profiles, and transformation
clues for changing scale, orientation, and thickness.  The remainder
of the file contains a curved contour representation of each glyph.

Each local installation is expected to have its own GLYPH CONVERTER
to generate local glyph files (see Figure 2).  The headers are simply
copied from Network Glyph Files, possibly changing scale,
orientation, and thickness.  The contours are converted to bit
matrices or vector outlines as appropriate.




In the case of trivial devices such as line printers, trivial glyph
files should be produced by the installation.  However, it is
important to stay within the framework of the registry.  For example,
if the LPT has an integral sign, it should be specified in the glyph
map as, say, "math-set 63" rather than as "latin-set 14".  The local
math-set glyph file would then specify that glyph 63 is really octal
14 on the LPT.  Other glyphs in the local math-set file would have no
good representation on the LPT.
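
The integral-sign example can be written out as a small Python
sketch.  The table layout and the name lpt_code are hypothetical;
the glyph number 63 is used exactly as written above, and the LPT
code is the octal 14 from the text.

```python
# Hypothetical local glyph files for a line printer, one per glyph set.
# Each maps a registry glyph number to the character code the LPT
# prints.  A glyph missing from the table has no good representation.
LPT_GLYPH_FILES = {
    "math-set": {63: 0o14},    # integral sign: octal 14 on the LPT
}


def lpt_code(glyph_set, glyph, fallback=None):
    """Return the LPT character code for a glyph, or fallback if none."""
    return LPT_GLYPH_FILES.get(glyph_set, {}).get(glyph, fallback)
```

The glyph map keeps saying "math-set 63" on every device; only the
local glyph file knows that this happens to be octal 14 on the LPT.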











































                              SECTION V
                              _______ _

                         THE MLISP EXTENSION
                         ___ _____ _________




Several simple changes to MLISP will be made:

(1) Contraction.  Some features that would be useless to the system
and to most authors will be removed in the interest of saving space.
Authors needing these features could LAP them in.

(2) Macros.  The MLISP "DEFINE" only replaces one token by another.
Macros in P must be able to replace either an identifier or a
sequence of delimiters by an arbitrary sequence of tokens.  Invisible
tokens such as spaces, tabs, and line boundaries must be recognized
as tokens in text expressions of P.

(3) Strings.  The LISP string facilities are different in every
system and inadequate in all.  P will have its own string package
with a few primitives to be encoded in LAP for each object machine.
A string will be a series of glyphs; thus, the package would compute
widths and heights of text units such as words at high speed.
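
What "computing widths and heights of text units" might look like is
sketched below in Python.  The metric values are invented for
illustration and the function names are hypothetical; a real package
would take its metrics from glyph file headers and would also
account for kerning.

```python
# Hypothetical metrics in points for one font and size:
# glyph -> (width, height).
METRICS = {
    "T": (6.0, 7.0),
    "e": (4.5, 5.0),
    "x": (5.0, 5.0),
    "t": (3.0, 6.5),
}


def word_width(word, metrics):
    """Width of a word: the sum of its glyph widths (no kerning)."""
    return sum(metrics[glyph][0] for glyph in word)


def word_height(word, metrics):
    """Height of a word: the height of its tallest glyph."""
    return max(metrics[glyph][1] for glyph in word)
```

With these figures the word "Text" is 18.5 points wide and 7.0
points high.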



ADVANTAGES AND DISADVANTAGES OF MLISP
__________ ___ _____________ __ _____


V-A.  

Among the advantages of an MLISP implementation of the new system
are:

(1) Efficiency.  The language will be processed by an extension of
the existing MLISP compiler, which translates at 3000 lines per
minute, more than three times faster than PUB Pass One.  Most PUB
macros could be procedures (EXPRs and FEXPRs) in P, so their
execution will be several times faster than in PUB (PUB spends much
of its time expanding macros).

(2) Flexibility.  Author procedures could directly call or redefine
procedures in the system.  During debugging, the author could set
breakpoints and perform traces.

(3) Portability.  The extended MLISP compiler will be written mostly
in STANDARD LISP, so that it will be transportable to new
installations with a minimum of effort.

The system should run equally well (except for speed differences) in
LISP1.6, TENEX-LISP, ILSP, MACLISP, and LISP70.  With a small amount
of LAP programming, it should run in LISPs on other computers than
the PDP-10 as well.

Disadvantages of MLISP are:

(1) Size.  The LISP1.6 version of the FORMATTER will probably be
nearly as large as PUB Pass One, because of LISP and MLISP overhead.
This will be remedied when LISP70 is operational.

(2) Inefficiency.  The PAGINATOR and POLISHER may be simple enough to
be programmed in machine language at a substantial gain in
efficiency.  This may be done after portable LISP versions are
operational.

































                             SECTION VI
                             _______ __

                              STANDARDS
                              _________




The following file formats shall be standardized:
(1) Individual Documents

        a. Manuscript.
        b. Galley and Document (same format).
        c. Cross-Reference Table.
        d. Galley Guide.
        e. Paginated Galley Guide (similar to d?).

(2) Registry

        a. Glossary
        b. Directory
        c. Glyph File Header
        d. Curved Contour Representation

The following programs shall be written in portable fashion:

(1) FORMATTER

(2) PAGINATOR

(3) POLISHER





















                             SECTION VII
                             _______ ___

                             REALIZATION
                             ___________




Manuscript and Registry standards shall be proposed by Palo Alto and
Galley and Document standards by Pittsburgh.

The FORMATTER shall be programmed by Rich Johnson and Brian Harvey
with assistance by Larry Tesler.

The PAGINATOR and POLISHER shall be programmed at CMU.

MLISP extensions shall be made at Stanford.

The ILSP implementation will be maintained by CMU, the LISP1.6 (and
later LISP70) implementations by Stanford, and the TENEX-LISP
implementation by Xerox.

Each installation shall provide its own glyph converters, text
editors, device specifications, and printer/viewers.  However, the
possibility of collaborating on XGP service should be explored as the
project proceeds.  CMU shall be the motivating force and shall do
most of the programming.

A target date of August 15 is suggested for a first version of the
system.  Although only a subset will be implemented in the first
version, the framework for supplying the remainder must be provided.

This optimistic estimate is based on the fact that PUB was completed
in six months by one person in an inappropriate language.  The new
implementation is simplified by separating pagination from filling
and by building on an existing compiler.  Although the new system has
many sophisticated facilities, they have all been done before in some
form by some of the implementors.














                            SECTION VIII
                            _______ ____

                             APPENDICES
                             __________




Included for completeness are memos by Dan Swinehart on the registry,
by Brian Harvey on math, and by Bob Sproull on graphics.
Unfortunately, Sproull's document is not machine-readable, so only an
abstract appears here.

It should be noted that these documents were prepared before the
above committee report.  Therefore many points have been incorporated
into the report or rejected by the committee.




































                             SECTION IX
                             _______ __

         SOME THOUGHTS ON STANDARD CHARACTER REPRESENTATION
         ____ ________ __ ________ _________ ______________



D. Swinehart -- 30 March, 1973
Reference: CHARAC.PRO[ESS,JMC] -- also a NIC document, don't know #




Registry Character Representation
________ _________ ______________


IX-A.  

1. Assume an arbitrary sized addressing space, say 100 bits (large
enough).  The Registry (official character specification for all
              ________
recognized characters) is therefore going to be sparse, and will have
to be represented in some complex structure.  Each entry in the
Registry is a character description, expressed in some accurate way.

2. Any set of entries which are logically related (a "character_set")
                                                      ______________
will want to occupy consecutive locations in  the address space --
high order bits identify which character set ("font").

3. Any character set which represents a font of the "standard"
character set (96-char ASCII or whatever), or would have some reason
to want to map onto that character set (e.g., Greek, Cyrillic, some
other script) should be arranged so that each entry's low order 7
bits (say) is the ASCII for the character it represents.  Or
something like that.

4. Some number of high order bits, perhaps leaving several above the
standard 7 or 8 for expansion of the basic set, can be officially
designated font_bits, applying to character sets as described in (3).
           _________
Others could be designated "nofont" bits, representing specific
unrelated graphics with no direct ASCII mappings.

5. Additional bits, I guess, could be designated to scaling,
rotating, and slanting fields, for those character-machine
combinations where graphics must be hand tuned when new size or
distortion characteristics are introduced.  It would be better, I
think, if these fields were left out of the Registry, and introduced
into specific machine-dependent representations, since some
implementations will be able to compute scaled and tilted characters
from their normal specification.



6. If a field designated to a purpose (font, etc.) overflows,
additional unused bits can be assigned to extend it -- they need not
be contiguous.  (Larry Tesler's more structured suggestions,
assigning to each character a set of property-value attributes,
avoid some of this -- I've stayed pretty primitive for reasons I
don't entirely understand).

7. The first few character addresses (say 0001 - 0111) will not be
assigned any graphics.  They are reserved as special control
characters (see below).
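
The layout suggested in (2) and (3), with font bits above and ASCII
below, can be sketched in Python.  The field width of 7 bits is the
"say" figure from (3); the function names are hypothetical, and a
real registry address would be far wider than this toy.

```python
ASCII_BITS = 7    # low-order bits holding the ASCII code, per item 3


def registry_address(font_number, ascii_char):
    """Pack a font number and an ASCII character into one address."""
    code = ord(ascii_char)
    if code >= 1 << ASCII_BITS:
        raise ValueError("not a 7-bit ASCII character")
    return (font_number << ASCII_BITS) | code


def split_address(address):
    """Recover (font number, ASCII character) from an address."""
    mask = (1 << ASCII_BITS) - 1
    return address >> ASCII_BITS, chr(address & mask)
```

Under this layout "A" in font 3 lives at address 3*128 + 65 = 449,
and all characters of one font occupy consecutive locations.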

Translation_Specifications (optional)
__________________________

1. A given installation may decide that certain of the high order
bits are much more common than others.  To get compact file
representations, they would like to shuffle things so that these bits
reside just "above" the basic character-set bits.

2. To do this, each file can specify (at its beginning or in
additional attributes) a translation rule taking
normalized_characters (see below) in the file to registry characters.
_____________________
A text file coming over the net might be translated twice (from
remote translation to registry, from registry to local translation)
before being stored.  The translation rule specifies the largest
character size (in bits), MAXCH, which an untranslated (file)
character will attain.

3. Each installation can specify a default translation.



Installation Implementation
____________ ______________


IX-B.  

1. There is no mention in the above of a standard byte size or the
equivalent.  The installation is free to choose any byte size it
wishes, as long as it is >4 (or so).  However, 7 bits is about
minimum for a reasonable representation.

2. A character representation is simply enough bytes to represent the
largest normalized character.  We'll call this number n, where
                                                      _
n=MAXCH/bytesize.  Part of the Translation specification's job is to
_
distribute parts of the registry character representations such that
reasonable things fall into reasonable bytes.





3. Nobody wants to treble or quadruple the size of his file just to
get all these features, so we want a way to distribute parts of
characters which will remain constant over large segments of a file.
The following special characters (which will be recognized no matter
what the prefix) will be interpreted as commands:

  1 -- prefix -- the next byte is a byte count, b. The byte after
       ______                                   _
        that is a byte index, i.  The next b bytes will replace the
                              _            _
        i-b+1th to ith bytes of the current prefix.
        _ _        ___

  2 -- charsize -- the next byte contains the size, c, in bytes, of
       ________                                     _
        each subsequent file character.  A normalized_character
                                           ____________________
        is then usually obtained by concatenating the current prefix
        to the next c bytes in the file.  There should be a system
                    _
        standard prefix, with i=n-1, b=n-1, c=1.
                              _ _    _ _    _

  3 -- escapeset -- the next byte is m, the size of an escape
       _________                     _
        character --  the default m is n.
                                  _    _

  4 -- escape -- the next m bytes form a full specification for the
       ______             _
        desired normalized character.


A registry character can then be formed from each normalized one, or
the device-dependent character specifications can just be stored in
normalized form.
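
One possible reading of the four commands is sketched below in
Python.  Several points are assumptions not settled above: the
prefix is held as n bytes, initially all zero; a command is
recognized when a file character (c bytes) has the value 1 to 4;
command arguments are single bytes; and a normalized character is
the first n-c bytes of the prefix followed by the c file bytes.  The
function and constant names are hypothetical.

```python
PREFIX, CHARSIZE, ESCAPESET, ESCAPE = 1, 2, 3, 4


def decode(data, n):
    """Decode a byte string into a list of n-byte normalized characters."""
    prefix = bytearray(n)    # assumed standard prefix: all zero bytes
    c = 1                    # bytes per file character
    m = n                    # bytes per escape character
    out = []
    pos = 0
    while pos < len(data):
        unit = data[pos:pos + c]
        pos += c
        code = int.from_bytes(unit, "big")
        if code == PREFIX:
            b, i = data[pos], data[pos + 1]
            pos += 2
            # Replace bytes i-b+1 .. i of the prefix (counting from 1).
            prefix[i - b:i] = data[pos:pos + b]
            pos += b
        elif code == CHARSIZE:
            c = data[pos]
            pos += 1
        elif code == ESCAPESET:
            m = data[pos]
            pos += 1
        elif code == ESCAPE:
            out.append(bytes(data[pos:pos + m]))
            pos += m
        else:
            out.append(bytes(prefix[:n - c]) + bytes(unit))
    return out
```

With n = 2, the bytes for "A", then a prefix command setting byte 1
to 7, then "a" decode to the characters (0, "A") and (7, "a"): the
second letter has moved to font 7 at a cost of four extra bytes, and
every later character stays there for free.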

Often people will want to override a prefix for some period of time,
then return to a previous setting.  This nesting can be provided by
commands at this level, or left to higher levels (like PUB).

At a given site, there will be processors (compilers, assemblers)
which, at least at first, will not want to handle the full generality
of this design.  If the design were adopted, these processors would
have to be modified just a bit.  They should be able to get away with
simply recognizing and ignoring all the control commands, including
prefixes, treating all characters as if they were standard font
Ascii.













                              SECTION X
                              _______ _

                        MATHEMATICAL NOTATION
                        ____________ ________




by Brian Harvey

The linear typein of mathematical displays requires a wide variety of
commands to be accepted, for different formatting operations, e.g.,
subscript.  This variety seems to me to preclude the use of single-
or double-character commands; instead, word commands like SUB for
subscript should be used.  This means that some escape convention
must be provided to make PUB or its successor distinguish command
words from text.

At Composition Technology, we had two notations for handling this
problem.  Individual command words were preceded by an escape
character (we used @), and for cases like mathematics where many
commands would be used in a row, a line starting with a tab was
considered to be all commands.  The latter notation is clearly
inappropriate for PUB, but some character sequence analogous to curly
brackets could be used to bracket math-style commands.  It would
probably be a good idea to define a single-command escape like @ as
well.

By convention, a one-letter "command" is taken at CTI to mean that
the letter should be printed in italic.  This works out very nicely
for math, because most variables are normally italicized.  Thus, to
print

           iπ
          e   + 1 = 0

a CTI typist would type

                     display e sup i pi base+1=0 dpyend

(Within a display, spaces are typed only to separate command words
and are otherwise ignored.  The spacing of the display is controlled
by the computer.)

Most display formatting commands come in bracketing pairs, like
display...dpyend and sup...base in the example above.  This notation
is somewhat more verbose than necessary, but has the advantage that
inner operations can be closed automatically by an outer operation's
terminator provided that the two operations are of different types;
also, ample error and warning messages are possible.  For certain
formats with relatively simple contents, macros with a simpler format
can be defined.  For example, to get a case fraction like the 1/2 on
a typewriter, the canonical syntax is "case 1 csden 2 csend";
however, a standard macro "cfract (1/2)" is provided.
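
To make the bracketing-pair idea concrete, here is a toy Python
translator for the display example given earlier.  It is not the CTI
system: the tables, the function name, and the choice of LaTeX as an
output notation are all assumptions made for illustration, and it
knows only sup...base and the symbol pi.  Automatic closing by an
outer terminator is not attempted; an unclosed pair is an error.

```python
import re

SYMBOLS = {"pi": r"\pi"}             # hypothetical symbol table
OPENERS = {"sup": ("^{", "base")}    # opener -> (output, expected closer)


def cti_to_latex(source):
    """Translate a 'display ... dpyend' string into LaTeX-like text."""
    # Spaces only separate command words; everything else is one token.
    tokens = re.findall(r"[A-Za-z]+|\S", source)
    if len(tokens) < 2 or tokens[0] != "display" or tokens[-1] != "dpyend":
        raise ValueError("expected display ... dpyend")
    pieces, closers = [], []
    for tok in tokens[1:-1]:
        if tok in OPENERS:
            text, closer = OPENERS[tok]
            pieces.append(text)
            closers.append(closer)
        elif closers and tok == closers[-1]:
            closers.pop()
            pieces.append("}")
        elif tok in SYMBOLS:
            pieces.append(SYMBOLS[tok])
        else:
            pieces.append(tok)       # letters, digits, operators
    if closers:
        raise ValueError("missing terminator: " + closers[-1])
    out = ""
    for piece in pieces:
        # Keep a letter from running into a preceding \name.
        if piece[0].isalpha() and re.search(r"\\[A-Za-z]+$", out):
            out += " "
        out += piece
    return out
```

Given "display e sup i pi base+1=0 dpyend", the sketch produces
e^{i\pi}+1=0.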

Some examples of formatting operators besides those already mentioned
are FAB for "function abbreviation" as in fab(cos) (abbreviations
like cos for fab(cos) would be standardly provided),
div...den...divend for a stacked fraction ("den" is for denominator),
coef...coden...coend for binomial coefficient, barovr...barend and
barudr...barend, plus some having to do with more global formatting
like dpyno...dnoend for a display number to be printed at the margin.
A more complicated problem is a matrix, which would include row and
column operators to separate and position the cells.  I have a
complete list of the operations used at CTI, but it seems pointless
to include it in a document like this.

Typesetting mathematics also requires decisions to be made about the
representation of characters in different fonts, etc.  This problem
is addressed below.



MATHEMATICS -- IMPLEMENTATION NOTES
___________ __ ______________ _____


X-A.  

Unfortunately, it seems unlikely that the mathematics processor can
be written completely independently of the text processor.  For one
thing, mathematical equations are sometimes found within a line of
text (this situation is hereafter called a DIT for "display-in-
text"), and the text line might have to be broken within the
equation.  Therefore the text processor needs not just an "atomic"
string representing the equation, but a good deal of break-precedence
information within the DIT.  Also, a great deal of low-level code
could be shared by all sections of the program; for example, the math
part and the graphics part both need to draw vectors.  This means
that the entire program will have to be collectively designed in some
detail before people can go off and do their parts.

One question which must be answered is the degree of sophistication
required in handling formatting problems.  For example, when
parentheses are to be used around some tall expression like a stacked
fraction, there are several ways the size of the parens can be
determined:

1.  There can be only two sizes, regular and big (say, 10 and 20
point), and the user can type (...) or obgpar...cbgpar as desired.

2.  There can be an explicit size operator, say "size(#) (" where #
is the desired size in suitable units.

3.  The program can recursively typeset the stuff inside the parens
and then go back and figure out how big to make the parens.

Of course, one can also imagine some combination of these with manual
override to an automatic calculation, etc.  The advantages of #3
should be obvious.  The disadvantages include the rather intricate
recursion problems (remember that parentheses are sometimes
unbalanced, so in the general case a backtracking procedure is
needed!), the difficulty of scaling characters on raster hardware,
and the slowing down of an already slow program.  Also, as a matter
of aesthetics, the right quantization of paren sizes is not obvious:
infinitely variable height to match the stuff inside would make each
set of parens look funny by comparison with a possibly slightly
different set nearby.
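
One possible combination of the three schemes is to measure the
contents automatically but snap the result to a small set of sizes,
with a manual override.  The Python sketch below assumes a
hypothetical list of available sizes, PAREN_SIZES, and a hypothetical
function paren_size; neither the sizes nor the rounding rule is part
of the proposal.

```python
import bisect

# Hypothetical point sizes in which parens are available.
PAREN_SIZES = (10, 14, 20, 28)

def paren_size(content_height, override=None):
    """Pick a paren size for contents of the given height (in points).

    An explicit override (scheme 2) wins; otherwise the smallest
    available size that covers the contents is used, and the largest
    size is the fallback for very tall contents.
    """
    if override is not None:
        return override
    i = bisect.bisect_left(PAREN_SIZES, content_height)
    return PAREN_SIZES[min(i, len(PAREN_SIZES) - 1)]
```

With quantized sizes, two nearby expressions of slightly different
height get identical parens, which avoids the mismatch described
above.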

Another formatting decision concerns the question of line breaking
within a long display.  The ultimate thing would be for the program
to decide where to break the line (a function of mathematical meaning
and distance into the line) and also how to align the two parts
vertically.  At the opposite extreme both line breaking and alignment
could be manually controlled.  If the interactive editor with TV
display is really going to happen, the latter possibility is not as
bad as it sounds.



FONT INFORMATION STORAGE.
____ ___________ ________


X-B.  

(Editor's Note: This plan was revised extensively by the Palo Alto
Committee.  See main proposal.)

In my opinion the proposed master font registry should not be the
source of font information for production programs.  Instead, there
should be a Font Information File, in a standardized format, to
describe those characters actually used for a particular job.  This
file need not contain the actual character generation information,
but merely certain dimensional information and a device-specific
pointer to find the character for output.  One reason for this is
that in the interests of efficiency programs shouldn't have to dig
through an immense file each time a job is run.  (These FIFs would
undoubtedly survive many typesetting runs.)  Another is that people
with limited hardware, e.g., a line printer, shouldn't have to
register the nonexistent generational data for their one and only
font in order to be able to use the system; instead, they need only
generate a trivial FIF.  Even for non-trivial devices, if we have,
e.g., a raster device and the registry standard is oriented to vector
devices, we can have our own FIF pointing directly to a raster
description of the char rather than having to generate it each time
from the vector description.

In the CTI system, the full name of a character has four parts: font,
style, overlay, and char code.  The font code represents things like
Times or Garamond; the style code is for italic or boldface; the
overlay indicates a set of chars like greek, math, or accents; and
the char code is a 7-bit code to determine the exact character.  The
full name of a character is 18 bits long, but in files, a condensed
notation is used: the font and style are globally set by escape code
sequences like @I for italic, overlay 1 chars (more or less the same
as ASCII) are simply represented by their 7-bit code, and other chars
are represented by two bytes, one for the overlay and one for the
char.  This uses the low, non-printing ASCII codes for overlay codes
(and for overlay-independent chars like fixed spaces), which makes
for a problem at Stanford, but it's not as bad as it might be since,
as will be seen shortly, people never actually type overlay codes.
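
To make the naming scheme concrete, here is a Python sketch of the
18-bit name and the condensed file form.  The text gives only the
total (18 bits) and the char code width (7 bits); the split of the
remaining 11 bits into font, style and overlay fields, the field
order, and the use of the overlay number itself as the overlay byte
are assumptions made for illustration.

```python
# Assumed field widths; only CHAR_BITS and the total of 18 are given.
FONT_BITS, STYLE_BITS, OVERLAY_BITS, CHAR_BITS = 4, 3, 4, 7

def pack_name(font, style, overlay, char):
    """Combine the four parts into one 18-bit full name."""
    parts = ((font, FONT_BITS), (style, STYLE_BITS),
             (overlay, OVERLAY_BITS), (char, CHAR_BITS))
    name = 0
    for value, bits in parts:
        if not 0 <= value < (1 << bits):
            raise ValueError("field out of range")
        name = (name << bits) | value
    return name

def unpack_name(name):
    """Split an 18-bit full name back into its four parts."""
    char = name & ((1 << CHAR_BITS) - 1)
    name >>= CHAR_BITS
    overlay = name & ((1 << OVERLAY_BITS) - 1)
    name >>= OVERLAY_BITS
    style = name & ((1 << STYLE_BITS) - 1)
    font = name >> STYLE_BITS
    return font, style, overlay, char

def condensed(overlay, char):
    """File form: overlay 1 chars take one byte, all others two."""
    return bytes([char]) if overlay == 1 else bytes([overlay, char])
```

Font and style do not appear in the condensed form because they are
set globally by escape sequences.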

The font code is not tied to an "absolute" font like Times.  Instead,
font 1 is, say, a serif font, font 2 is script, and font 3 is
sans-serif.  This way, files may contain commands like @SANSER for
sans-serif, and to change fonts all you have to do is use a different
FIF, which controls the relative-to-absolute font conversion.  In
fact, in theory no part of the char name is absolute in this sense:
systematic errors in interpreting a manuscript character can be
corrected by producing a nonstandard font file in which the bad name
is changed to indicate the good character.

Another sort of name possessed by a character is its mnemonic.  This
allows the typesetting of non-keyboard characters by saying @INT for
integral, etc.  At CTI the mnemonic represents only the overlay and
char codes, so one can have light and bold INTs.  (This is especially
important for accents, which come in all fonts and styles.)  The
mnemonic is what saves users from having to type overlay codes
explicitly.  Thus in Stanford's case it would be possible to accept a
Stanford-ASCII source file and put out real-ASCII-compatible output.

The information in a FIF must include, for each char, its 18-bit
name, its mnemonic, its "absolute" name and/or a pointer into the
registry, a pointer to the device-specific generation data, its point
size, width, height above and below the baseline, accent class and
math class (see below), kerning profile (see below), and possibly a
modifications field (also described below).  If the FIF is stored in
binary for efficient use, there should also be a standard ASCII
representation, and translators between the two should be written.
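
As an illustration of what one FIF entry and its ASCII form might look
like, here is a Python sketch.  The class name FIFEntry, the field
names, the integer units, and the one-line, space-separated ASCII
layout are all assumptions; the proposal fixes only which items of
information must be present.

```python
from dataclasses import dataclass, fields

@dataclass
class FIFEntry:
    name: int             # 18-bit full character name
    mnemonic: str         # e.g. "INT"; assumed to contain no spaces
    absolute_name: str    # "absolute" name or registry key
    device_pointer: int   # where the device finds the character
    point_size: int
    width: int
    height_above: int     # height above the baseline
    height_below: int     # depth below the baseline
    accent_class: int
    math_class: int
    kerning_profile: int  # 36-bit profile
    modifications: str = "-"  # "-" means no modifications

    def to_ascii(self):
        """One entry per line, fields separated by single spaces."""
        return " ".join(str(getattr(self, f.name)) for f in fields(self))

    @classmethod
    def from_ascii(cls, line):
        """Inverse of to_ascii."""
        p = line.split()
        numbers = [int(x) for x in p[3:11]]
        return cls(int(p[0]), p[1], p[2], *numbers, p[11])
```

A binary FIF could be produced from such lines and translated back
again, so that the ASCII form remains the form exchanged between
installations.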

The math class of a character is a number representing its
mathematical meaning, e.g., binary operator, integral sign, open or
close fence (like parens).  The accent class, for a non-accent,
distinguishes between lower case letters, undotted i or j, and
everything else.  It also may indicate that the char should be
treated as italic even though it isn't, as for certain greek letters
which take italic accents.  (Phi does, pi doesn't.)  For an accent,
it indicates which of the above types the accent goes with, and also
whether it is an above accent (circumflex), a below accent (cedilla),
or a superposed accent (slash or bar as through an h for h/2π).

The kerning profile has to do with the problem of putting characters
next to each other.  Generally a character can be considered as a
rectangle which is partly inked in, and the char's dimensions are
those of the rectangle.  Consider an italic lower case f, in a word
like "food."  The top of the f must overlap the box containing the o
in order not to have too much space between the letters.  Generally
this is done by understating the width of the f in the FIF.  Now
suppose we want an italic f followed by a close paren.  The problem
will be that the top of the f will overlap the paren.  To avoid this
the program must know something about the shape of the characters,
but preferably not as much as a full description because the
computation of each kerning problem would then be incredibly
expensive.  At CTI we divide each rectangle vertically into 6 pieces
of equal height (the height thus depends on the total char height),
and for each compute a 3-bit representation of the extent to which
the ink extends past (or doesn't fill) the box at each end.  This
adds up to 6*2*3=36 bits of profile info.  There is also a kerning
problem in the other direction, as in typesetting L**2.  The little 2
should really be somewhat inside the L's box.
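
The horizontal profile scheme can be sketched in Python as follows.
This is an illustration, not the CTI code.  It assumes that each 3-bit
value is a signed number from -4 to 3 (positive where ink extends past
the box edge, negative where the ink falls short of it), that the six
bands are listed from top to bottom, and that the two characters have
boxes of the same height so that their bands line up.  The function
names are hypothetical.

```python
BANDS, BITS = 6, 3  # 6 bands, 3 bits each, for each of 2 sides

def pack_profile(left, right):
    """Pack 6 left-edge and 6 right-edge values into 36 bits."""
    word = 0
    for value in list(left) + list(right):
        if not -4 <= value <= 3:
            raise ValueError("profile value needs 3 bits")
        word = (word << BITS) | (value & 0b111)
    return word

def unpack_profile(word):
    """Return (left, right) lists of signed band values."""
    values = []
    for shift in range((2 * BANDS - 1) * BITS, -1, -BITS):
        v = (word >> shift) & 0b111
        values.append(v - 8 if v >= 4 else v)
    return values[:BANDS], values[BANDS:]

def extra_space(first, second):
    """Space to add so the first char's ink clears the second's."""
    _, right_edge = unpack_profile(first)
    left_edge, _ = unpack_profile(second)
    worst = max(r + l for r, l in zip(right_edge, left_edge))
    return max(0, worst)
```

For an italic f whose top band overhangs by 2 units, a following o
(empty in the top band) needs no extra space, while a close paren
(inked at the box edge in the top band) needs 2 units.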

The character modification feature was developed at CTI because our
hardware made new character generation difficult and costly.
Sometimes we had problems which could be solved merely by
repositioning or changing the scale of an existing char; for example,
to get a center asterisk like * from one in superscript (footnote)
position, one simply puts a vertical drop before the asterisk and a
corresponding rise after it.  Compound chars like ≤ can also be made
this way, treating one part as a superposed accent positioned over
the other.  In a situation like ours where it is easier to make new
chars, this might not be required, but on the other hand it might be
easier for a user than finding Tovar.  The idea is to have the FIF
entry describe the "target" graphic, and not until the last stage
does the computer discover that it has to modify the "source"
graphic.  This feature, if used, would best be allowed to be
nonstandard in its details so that individual installations could
provide those facilities present in the hardware.









































                                 23
                                                        April 8, 1973


                             SECTION XI
                             _______ __

                   PROPOSAL FOR GRAPHICS LANGUAGE
                   ________ ___ ________ ________




by Robert Sproull

This is an editor's abstract of a typewritten document.

A document is composed of "boxes" with geometry, marked where page
breaks can occur.  Each box has a "body" and "i.d. info".  The body
has printing rules.  The i.d. has names for subtitling and
positioning relative to other boxes.  Processing within each box is
independent, allowing for incremental compilation of a document.

LISP procedures are more useful than macros, e.g., to specify line
drawings in the graphics section.

Line-drawing primitives are suggested: absolute/relative point/line,
line or curve with thickness and texture, string (caption), device-
dependent code.

The coordinate system is floating-point and chosen by the user.

Curves are given in terms of endpoints and control points.  The
latter are not necessarily on the curve, but guide the fitter.

Program must be able to interrogate the state, including questions
like "How many inches would a vector of length dx,dy occupy?".  Other
questions: resolution, string dimensions, aspect ratio.

A display procedure (cf. Newman, CACM) has arguments, prog variables,
and also a "master rectangle" within which it can draw.  A display
procedure call may optionally specify the instance rectangle, as well
as location, rotation, scale, and transform matrix.  The system
automatically applies these transformations from the user's
coordinate system to the page.
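
The transformation applied to a display procedure call might be
sketched in Python as below.  The function names, the argument names,
and the order of operations (scale, then rotate, then translate) are
assumptions for illustration; the abstract does not specify them, and
a general transform matrix is not shown.

```python
import math

def to_page(point, location=(0.0, 0.0), rotation=0.0, scale=1.0):
    """Map a point in the procedure's coordinates to page coordinates.

    Assumed order: scale, then rotate (radians, counterclockwise),
    then translate to `location`.
    """
    x, y = point[0] * scale, point[1] * scale
    c, s = math.cos(rotation), math.sin(rotation)
    return (x * c - y * s + location[0], x * s + y * c + location[1])

def draw_line(start, end, **placement):
    """Stand-in primitive: return the transformed endpoints."""
    return to_page(start, **placement), to_page(end, **placement)
```

The display procedure itself draws only in its own coordinates; the
placement given in the call is applied to every primitive it emits.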

Display procedure calls draw within a "box" of given size as
mentioned earlier.







                                 24
April 8, 1973


                             SECTION XII
                             _______ ___

                              FIGURE 1
                              ______ _



BLOCK DIAGRAM -- PART 1 OF 2

               +++++++++++
              |  SCRIBBLE |
               +++++++++++
                    |
                    ∨
               ***********
              |TEXT EDITOR|
               ***********
                    |
                    ∨
               -----------
              |           |
              | MANUSCRIPT|
              |           |
               -----------
                    |
                    |<-------------------------
 +++++++++          ∨                          |
|         |    ***********      ************   |
| MONITOR |<--| FORMATTER |--->|ALPHABETIZER|--
|         |    ***********      ************
 +++++++++     |        |
               ∨        ∨
    ------------    ----------
   |            |  |          |
   |GALLEY GUIDE|  |  GALLEY  |
   |            |  |          |
    ------------    ---------- 













                                 25
FIGURE 1                                                April 8, 1973


BLOCK DIAGRAM -- PART 2 OF 2

    ------------    ----------
   |            |  |          |
   |GALLEY GUIDE|  |  GALLEY  |-----
   |            |  |          |     |
    ------------    ----------      |
             |                      |
             ∨                      |
            ***********             |
           | PAGINATOR |            |
            ***********             |
             |       |              |
             ∨       ∨              |
     -----------   -----------      |
    | PAGINATED | |   CROSS   |     |
    |  GALLEY   | | REFERENCE |     |
    |   GUIDE   | |   TABLE   |     |
     -----------   -----------      |
             |           |        -----
             |           |       |     |
             ∨           ∨       ∨     |
             ---------------------     |
                 |             .       |
                 ∨             .       |
            ***********        .       |
           |  POLISHER |       .       |
            ***********        .       |
                 |             .       |
                 ∨             .       |
            -----------        .       |                  +++++++++
           |           |       ∨       ∨   *********     |HARD COPY|
           |  DOCUMENT |------------------| PRINTER |--->|   OR    |
           |           |                  | /VIEWER |    | DISPLAY |
            -----------                    *********      +++++++++














                                 26
April 8, 1973


                            SECTION XIII
                            _______ ____

                              FIGURE 2
                              ______ _



GLYPH CONVERTER

         ----------
        |          |
        | REGISTRY |
        |          |
         ----------
             |
             ∨
         ***********
        | CONVERTER |
         ***********
          |       |
          ∨       ∨
 ------------    ---------
|  GLYPH     |  |  GLYPH  |
|DESCRIPTIONS|  | IMAGES  |
 ------------    ---------

























                                 27